Robust Regression with Asymmetric Heavy-Tail Noise Distributions

Authors

  • Ichiro Takeuchi
  • Yoshua Bengio
  • Takafumi Kanamori
Abstract

In the presence of a heavy-tail noise distribution, regression becomes much more difficult. Traditional robust regression methods assume that the noise distribution is symmetric, and they downweight the influence of so-called outliers. When the noise distribution is asymmetric, these methods yield biased regression estimators. Motivated by data-mining problems for the insurance industry, we propose a new approach to robust regression tailored to asymmetric noise distributions. The main idea is to learn most of the parameters of the model using conditional quantile estimators (which are biased but robust estimators of the regression) and to learn a few remaining parameters that combine and correct these estimators so as to minimize the average squared error in an unbiased way. Theoretical analysis and experiments show the clear advantages of the approach. Results are reported on artificial data as well as insurance data, using both linear and neural network predictors.
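The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the multiplicative log-normal noise model, the one-parameter quantile model q(x) = b·x, the two-parameter correction c·q(x) + d, and every function name below are assumptions chosen to keep the example short. Stage one fits a conditional median with the pinball (quantile) loss, which is robust but biased for the mean under asymmetric noise; stage two fits only two parameters by least squares to correct that bias.

```python
import random


def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight share reaches tau."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for v, w in pairs:
        running += w
        if running >= tau * total:
            return v
    return pairs[-1][0]


def fit_quantile_slope(xs, ys, tau):
    """Exact pinball-loss minimizer for the model q(x) = b * x with x > 0.

    Because the pinball loss is positively homogeneous,
    loss(y - b*x) = x * loss(y/x - b), so b is the weighted
    tau-quantile of the ratios y/x with weights x.
    """
    ratios = [y / x for x, y in zip(xs, ys)]
    return weighted_quantile(ratios, xs, tau)


def fit_correction(qs, ys):
    """Least-squares fit of y on q: (c, d) minimizing sum (y - c*q - d)^2."""
    n = len(qs)
    mean_q = sum(qs) / n
    mean_y = sum(ys) / n
    s_qq = sum((q - mean_q) ** 2 for q in qs)
    s_qy = sum((q - mean_q) * (y - mean_y) for q, y in zip(qs, ys))
    c = s_qy / s_qq
    d = mean_y - c * mean_q
    return c, d


def mean_squared_error(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


# Synthetic data (an assumption for illustration): y = x * eps, where eps is
# log-normal, so the noise is asymmetric and heavy-tailed on one side.
rng = random.Random(0)
n = 2000
xs = [rng.uniform(1.0, 3.0) for _ in range(n)]
ys = [x * rng.lognormvariate(0.0, 1.0) for x in xs]

# Stage 1: robust but biased estimator (conditional median).
b_med = fit_quantile_slope(xs, ys, tau=0.5)
median_preds = [b_med * x for x in xs]

# Stage 2: learn two parameters under squared error to remove the bias.
c, d = fit_correction(median_preds, ys)
corrected_preds = [c * q + d for q in median_preds]

mse_median = mean_squared_error(median_preds, ys)
mse_corrected = mean_squared_error(corrected_preds, ys)
print("median slope:", b_med)
print("correction (c, d):", c, d)
print("training MSE, median vs corrected:", mse_median, mse_corrected)
```

In this setup the conditional median underestimates the conditional mean, because the log-normal noise has its mean above its median. The least-squares correction touches only two parameters, so the large right-tail observations influence far fewer degrees of freedom than they would in a full least-squares fit.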

Similar resources

Statistical Wavelet-based Image Denoising using Scale Mixture of Normal Distributions with Adaptive Parameter Estimation

Removing noise from images is a challenging problem in digital image processing. This paper presents an image denoising method based on a maximum a posteriori (MAP) density function estimator, which is implemented in the wavelet domain because of its energy compaction property. The performance of the MAP estimator depends on the proposed model for noise-free wavelet coefficients. Thus in the wa...

The Family of Scale-Mixture of Skew-Normal Distributions and Its Application in Bayesian Nonlinear Regression Models

In previous studies on fitting non-linear regression models with the symmetric structure the normality is usually assumed in the analysis of data. This choice may be inappropriate when the distribution of residual terms is asymmetric. Recently, the family of scale-mixture of skew-normal distributions is the main concern of many researchers. This family includes several skewed and heavy-tailed d...

Hessian Stochastic Ordering in the Family of multivariate Generalized Hyperbolic Distributions and its Applications

In this paper, random vectors following the multivariate generalized hyperbolic (GH) distribution are compared using the Hessian stochastic order. This family includes the classes of symmetric and asymmetric distributions by which different behaviors of kurtosis in skewed and heavy-tailed data can be captured. By considering some closed convex cones and their duals, we derive some necessary and s...

Tail Probabilities for Regression Estimators

Estimators of regression coefficients are known to be asymptotically normally distributed, provided certain regularity conditions are satisfied. In small samples and if the noise is not normally distributed, this can be a poor guide to the quality of the estimators. The paper addresses this problem for small and medium sized samples and heavy tailed noise. In particular, we assume that the nois...

The Challenge of Non-Linear Regression on Large Datasets with Asymmetric Heavy Tails

Regression becomes unstable under a heavy-tail error distribution due to the dominant effects of outliers. Traditional robust estimators are helpful under symmetric error, because they reduce the effect of outliers equally on both sides of the distribution. Under asymmetric error, however, those estimators are biased because the outliers appear on only one side of the distribution. Motivated by datamining p...

Journal:
  • Neural Computation

Volume 14, Issue 10

Pages -

Publication date: 2002